Abstract: We propose a new model for peer-to-peer networking
which takes the network bottlenecks into account beyond the
access. This model can cope with key features of P2P networking
like degree or locality constraints together with the fact that
distant peers often have a smaller rate than nearby peers.

Using a network model based on rate functions, we give a
closed form expression of peers' download performance in the
system's fluid limit, as well as approximations for the other cases.
Our results show the existence of realistic settings for which the
average download time is a decreasing function of the load, a
phenomenon that we call super-scalability.

I. Introduction 

The Peer-to-Peer (P2P) paradigm has been widely used to 
quickly deploy low-cost, scalable, decentralized architectures. 
For instance, the success of BitTorrent [1] has shown that
file-sharing can be provided with full scalability. Although many
other architectures currently compete with P2P (dedicated 
Content Distribution Networks, Cloud-based solutions, . . . ), 
P2P is still unchallenged with respect to its low-cost and 
scalability features, and remains a major actor in the field of 
content distribution. 

Today, the main limitation for P2P content distribution 
is probably the access upload bandwidth, as even high- 
speed Internet access connections are often asymmetric with 
a relatively low uplink capacity. Therefore most P2P content 
distribution performance models assume a relatively low access 
bandwidth as the main performance bottleneck. However, in
the near future the deployment of very high speed access (e.g.
FTTH) will challenge the justification of this assumption. This
raises the need for new P2P models that describe what happens
when the access is not necessarily the main or only bottleneck and
that allow one to better understand the fundamental limitations
of P2P.

A. Contributions 

A new model. The first contribution of the present paper
is the model presented in Section III, which features the
following two key ingredients: 1) a spatial component thanks
to which the topology of the peer locations is used to determine
their interactions; 2) a networking component allowing one to
represent the actual exchange throughput between peers.

A promising form of scalability. In most P2P bandwidth
models, the upload/download capacity is the bottleneck
determining the exchange throughput obtained by peers [2], [3], [4].
This creates scalability, where the download latency remains
constant when the system load increases. Our new model
exhibits a stronger form of scalability, which we call
super-scalability, where the service latency actually decreases with
the system load.



We show in Sections II and IV that super-scalability is a
consequence of network dynamics causing the service rate of
a typical customer to increase with the load of the system.

Conditions for super-scalability to hold. One may question
the realism of such a model, as the underlying network
obviously cannot sustain arbitrarily high rates. Section V
combines our model with an abstract (physical) network model
to determine the conditions for which our model makes sense
and super-scalability occurs.

Another natural issue is data availability: bandwidth can
be a bottleneck only if peers have something to transmit to
each other. We address this issue in Section VI, where we
study the impact of data availability on the effective download
performance.

The laws of super-scalability. Starting from the basic
model studied in Section IV, we build in Section VII a Swiss
Army Knife for handling many realistic variants: generic rate
functions, auxiliary servers, seeding behavior of users, access
bottleneck conditions, etc. The corresponding laws determine the
optimal tuning of the parameters of the P2P algorithms, e.g.
peering degree, transport protocol or seeding times.

B. Related Work 

Our main scenario is inspired by a BitTorrent-like
file-sharing protocol. In BitTorrent [1], a file is segmented into
small chunks and each downloader (called leecher) exchanges
chunks with its neighbors in a peer-to-peer overlay network. A
peer may continue to distribute chunks after it has completed
its own download (it is then called a seeder). Here is a short
summary of what is known about this scenario.

Bandwidth-centered modeling. Some studies have
analyzed the effectiveness of P2P file-sharing with a simple
dynamic system model of peer arrival, focusing on the
performance under the assumption that the access bandwidth is the
main bottleneck [2], [3], [4]. While the present paper focuses
on a similar bandwidth-centered approach, it introduces a
richer family of peer interaction models.

Chunk availability. Another potential bottleneck is chunk
availability. The worst possible case is the "missing piece
syndrome" [5], where one chunk keeps existing in only a
few copies (or none!) and the peer population can grow
unboundedly while trying to get that chunk. The syndrome
may happen for some scenarios [6], [7], but it can be avoided
by using more or less sophisticated download policies, at the
cost of somewhat increased download times, see [6], [7], [8],
[9], [10]. Also note that [11] proposed an elegantly abstracted
stochastic chunk-level model of uncoordinated file-sharing.
The results in [11] indicate that if the system has a high input
rate and starts with a large and sufficiently balanced population
of chunks, it may perform for a long time without missing
chunks even if there is no seeder.

In this paper, we assume that missing chunk issues are
avoided by some mechanism (like getting the locally rarest
chunk with high priority), so the impact of chunk availability on
performance is reasonable. Nevertheless, we estimate this impact
through a very simple chunk-level modeling, inspired by the
ones proposed in [3] and [11].

Spatially-dependent rate. While a large number of studies 
consider the case of heterogeneous rates, to the best of our 
knowledge, none considers a system where the transfer speeds 
depend on pair-wise distances but not on the nodes as such. 
There are some earlier papers considering P2P systems in a
spatial framework (for instance, [12]), but they do not assume
that distance has some effect on transfer speed. Our paper
seems to be the first where a peer's downloading rate is a 
function of its distances to other peers. 

II. Super-scalability Toy Example 

Before getting into the core of the paper, consider a system
in steady state where peers arrive with some arrival intensity $\lambda$,
download some file of size F and leave the system as soon as
their own download is completed. We neglect here geometry
as well as chunk availability issues. By the latter we mean that
a peer always has a chunk to provide to another, unfinished
peer.

Suppose that the access upload bandwidth is the main
bottleneck. If U is the typical upload bandwidth of a peer,
then it makes sense to assume that U is also the typical
download throughput experienced by each peer. In particular,
in the steady state (if any), the mean latency W and the average
number of peers N should be such that

$$W = \frac{F}{U} \quad \text{and} \quad N = \lambda W = \frac{\lambda F}{U} \quad \text{(Little's Law)}. \tag{1}$$

Although very simple, (1) contains a core property of standard
P2P systems: the mean latency is independent of the arrival
rate. This is the scalability property, one of the main
motivations for using P2P.

Now, imagine a complete shift of the bottleneck paradigm.
Let the main resource bottleneck be the (logical, directed)
links between nodes instead of the nodes themselves. We
should then consider the typical bandwidth U from one peer
to another as the key limitation. If each peer is connected
to every other one (the interaction graph is complete at any
time), then each peer downloads at a rate of about UN, so
Equation (1) should be replaced by $W = \frac{F}{UN}$
and $N = \lambda W$, which leads to

$$N = \sqrt{\frac{\lambda F}{U}} \quad \text{and} \quad W = \sqrt{\frac{F}{\lambda U}}. \tag{2}$$

Now, the service time is inversely proportional to the square
root of the arrival intensity: this is super-scalability.
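The contrast between the two regimes of the toy example can be checked numerically. The sketch below assumes that the full-mesh relations read W = F/(UN) and N = λW, as the text suggests; the function names and the numerical values are illustrative choices, not part of the original paper.

```python
import math


def access_limited_latency(file_size, upload_rate):
    """Equation (1): W = F / U, which does not depend on the arrival rate."""
    return file_size / upload_rate


def full_mesh_latency(arrival_rate, file_size, link_rate):
    """Solve W = F / (U * N) together with N = lambda * W for W."""
    return math.sqrt(file_size / (arrival_rate * link_rate))


if __name__ == "__main__":
    F, U = 8e9, 1e6  # hypothetical values: file size in bits, rate in bits/s
    for lam in (0.01, 0.1, 1.0):  # arrivals per second
        w = full_mesh_latency(lam, F, U)
        print(f"lambda={lam}: N={lam * w:.1f}, W={w:.0f} s "
              f"(access-limited W={access_limited_latency(F, U):.0f} s)")
```

Multiplying the arrival rate by four halves the full-mesh latency, while the access-limited latency stays constant.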

Remark 1: In fact, the real solution is a little bit more
complex than that due to size fluctuations that have not been
taken into account here. A more rigorous description of the
toy model is available in [13].

In this toy example, the central reason for super-scalability
is rather obvious: the number of edges in a complete graph is
of the order of the square of the number of nodes, and so is
the overall service capacity.

The main question addressed in the present paper is to 
better understand the fundamental limitations of P2P systems 
and in particular to check whether super-scalability can possi- 
bly hold in future, network-limited, P2P systems, where the 
throughput between peers will be determined by transport 
protocols and network resource limitations rather than the 
upload capacity alone. This requires the definition of a new 
model allowing one to capture the toy model idea while taking 
into account the limitations inherent to P2P overlays as well 
as network capacity constraints. 

III. Network Limited P2P Systems 

The aim of this section is to define a basic model that tries 
to capture super-scalability, spatially dependent rates and P2P 
constraints. This model will be extended in the last sections 
of the paper. 

Spatial domain. Our peers live in a domain D equipped
with a distance d. The meaning of d can be manifold: physical
distance; latency-based pseudo-distance [14]; D can even be
some representation of peer categories, the position of a peer
representing its own centers of interest. The main point is that
we assume that the rate between two peers depends on their
distance in D. For simplicity, we focus on a basic model where
D is an arbitrarily large torus that approximates the Euclidean
plane $\mathbb{R}^2$, but there is no basic difficulty in extending this
framework to other topologies better suited to model networks,
like a hyperbolic space [15]. Distances in D are expressed in
meters, regardless of the actual meaning of D.

Arrival rate. We assume that new peers arrive according to
a Poisson process with space-time intensity $\lambda$ ("Poisson rain").
The parameter $\lambda$, expressed in m$^{-2}$ · s$^{-1}$, describes the birth
rate of peers: the number of peers that arrive in a domain of
surface A (expressed in m$^2$) in an interval [s, t] (in seconds)
is a Poisson random variable with parameter $\lambda A (t - s)$.
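The "Poisson rain" arrival process is easy to sample. The following sketch assumes a square torus of side `side` (a hypothetical parameter) and uses the standard fact that a space-time Poisson process of intensity λ on a domain of surface A has exponential inter-arrival times of rate λA and uniformly distributed positions.

```python
import random


def poisson_rain(lam, side, horizon, seed=None):
    """Sample arrivals (t, x, y) with space-time intensity lam (per m^2 per s)
    on a square domain of side `side` (m) during [0, horizon] (s)."""
    rng = random.Random(seed)
    total_rate = lam * side * side  # arrivals per second over the whole domain
    arrivals = []
    t = rng.expovariate(total_rate)
    while t <= horizon:
        # Positions are independent and uniform over the domain.
        arrivals.append((t, rng.uniform(0.0, side), rng.uniform(0.0, side)))
        t += rng.expovariate(total_rate)
    return arrivals
```

The number of sampled arrivals is then a Poisson random variable with parameter λ · side² · horizon, in line with the definition above.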

Data rate. For our basic model, we assume that the transfer
rate is determined by a congestion mechanism like TCP Reno.
On the path between two peers, let $\vartheta$ denote the packet loss
probability and RTT the round trip time. Then the square
root formula [16] stipulates that the rate obtained on this
path is $\frac{\xi}{\mathrm{RTT}\sqrt{\vartheta}}$, with $\xi \approx 1.309$. Assuming the RTT to be
proportional to the distance r yields a transfer rate of the form

$$f(r) = \frac{C}{r}, \tag{3}$$

where C is a rate parameter expressed in bits · m · s$^{-1}$.
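To see how a rate of the form C/r follows from the square root formula, here is a minimal sketch. It assumes that the formula gives a throughput in packets per second and that the RTT grows linearly with distance; `rtt_per_meter` and `packet_bits` are hypothetical parameters introduced only for this illustration.

```python
import math

XI = 1.309  # constant of the square root formula, as quoted in the text


def square_root_rate(rtt, loss):
    """Throughput xi / (RTT * sqrt(loss)); packets per second is assumed here."""
    return XI / (rtt * math.sqrt(loss))


def rate_parameter(rtt_per_meter, loss, packet_bits):
    """Hypothetical C (bits * m / s) when RTT = rtt_per_meter * distance."""
    return packet_bits * XI / (rtt_per_meter * math.sqrt(loss))


def rate_function(r, c):
    """Rate between two peers at distance r, of the form C / r."""
    return c / r
```

Under these assumptions, `rate_function(r, rate_parameter(...))` coincides with the square root formula evaluated at RTT = `rtt_per_meter * r`.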



We assume that the rates are additive, so that the total
download rate of a peer x is

$$\mu(x) = \sum_{y \in N(x)} f(d(x,y)) = \sum_{y \in N(x)} \frac{C}{d(x,y)}, \tag{4}$$

where N(x) is the set of neighbors of x (in the overlay) and
d(x, y) the distance between x and y.

We consider symmetric connections, because: the data rate
function is symmetric; chunk availability may be neglected
for proper parameters (see Section VI); some tit-for-tat
mechanisms may be at play to enforce some kind of reciprocity
between peers. By symmetry, $\mu(x)$ is also the upload rate of a
peer at x. In order for the access not to be a further limitation,
the access capacity of a peer at x should exceed $\mu(x)$. This is
our default assumption here (access as a possible bottleneck
is considered in Section VII).



The choice of a rate function given by (3) is mainly for
giving explicit results based on a simple distance-varying rate.
Our results indeed apply for a wide range of rate functions (cf.
Section VII-A).

Data size. Each peer p wants to get an amount $F_p > 0$ of
data. In the basic BitTorrent example where every peer wants
to get the same file, $F_p$ would most naturally be modeled by a
constant F (the size of the file). For the sake of mathematical
tractability, in the analytical models, we follow the approach
used by [3] and assume that the $F_p$'s are independent and
identically distributed random variables, with finite expectation
$F = E(F_p)$.

Unaltruism. When a peer has finished its download, it 
leaves the system immediately (instead of becoming a seeder). 

Connectivity limitation. The toy example assumes full
mesh connectivity between peers, which is not a reasonable
assumption. In practice, peers usually limit their neighborhood
by using some overlay graph. There are many ways to build
an overlay, for instance by selecting only peers with sufficient
qualities and/or by limiting their total number of neighbors.
In the basic model, we propose to define connectivity by a
range R: if $\Phi_t$ is the set of peers present at time t, then
$N_t(x) = \{y \in \Phi_t,\ y \neq x,\ \text{s.t. } d(x, y) < R\}$. The range can for
instance originate from an ALTO-like connection management
that prevents peers too far from one another from connecting [17].
This constraint is even more meaningful in a wireless context,
as it can represent the transmission range.
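The range-based neighborhood and the additive rate with f(r) = C/r can be sketched as follows. This is only an illustration: peers are represented by their coordinates on a square torus of side `side` (a hypothetical parameter), and two peers are assumed never to share the same position, which holds almost surely under Poisson arrivals.

```python
import math


def torus_distance(p, q, side):
    """Distance between points p and q on a square torus of side `side`."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return math.hypot(min(dx, side - dx), min(dy, side - dy))


def neighbors(x, peers, side, R):
    """Peers other than x located at distance less than R from x."""
    return [y for y in peers if y != x and torus_distance(x, y, side) < R]


def download_rate(x, peers, side, R, C):
    """Sum of the rates C / d(x, y) over the neighbors y of x."""
    return sum(C / torus_distance(x, y, side)
               for y in neighbors(x, peers, side, R))
```

For example, with peers at (0, 0), (1, 0), (9.5, 0) and (5, 5) on a torus of side 10, range R = 2 and C = 3, the peer at the origin has two neighbors, at distances 1 and 0.5, and a total rate of 3/1 + 3/0.5 = 9.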

Other connectivity rules could be enforced, for instance
random connectivity, but if the rate function decreases with the
distance, it is only natural to enforce proximity in the overlay
graph. Later in the paper (Section VII), we propose another
proximity-based variant where a constant number of closest
peers is selected.

Chunks. In order to focus on bandwidth aspects, the basic
model follows the approach proposed by [3]: we assume that
the effect of chunk (un)availability between peers is that the
download effectiveness is affected by some factor $\eta \leq 1$. In the
following, we omit $\eta$ by assuming that file sizes are virtually
scaled by a factor $1/\eta$. The actual value of $\eta$ will be investigated
in Section VI-B.



IV. Study of the Basic Model 

In this section, we give some theoretical results for the
basic model when D is a subdomain of the Euclidean plane
(or a two dimensional torus). We only give here the key ideas
that explain the results. Detailed proofs are available in [13].

A. Steady State 

The system's dynamics belongs to the class of spatial birth
and death processes [18]. The births are the peer arrivals
described above. The death rate of a peer at x is $\mu(x)/F$
with $\mu(x)$ given by formula (4). The first result is about the
stability of the system:

Proposition 1: If the domain D in which the peers live
is compact, then the spatial birth and death process (i.e. the
positions of peers present at time t) forms a Markov process
which is ergodic for any birth rate $\lambda > 0$.

The proof of Proposition 1 is based on a domination
argument. The claim also holds in $\mathbb{R}^2$ but requires a more
sophisticated proof that will appear in a forthcoming paper.

According to Proposition 1, the model admits a steady state
regime where the peers (in the basic model all leechers) form
a stationary and ergodic point process in D [19].



We denote by $\beta_o$ the density of the peer (leecher) point
process, by $\mu_o$ the mean rate of a typical peer, by $W_o$ the
mean latency of a typical peer, and by $N_o$ the mean number
of peers in a ball of radius R around a typical peer, all in the
steady state regime of the P2P dynamics.

In the following, we will also consider several approximations
of the main model:

• a fluid regime/limit, where the corresponding quantities
will be denoted by a subscript f (e.g. $\beta_f$);

• a heuristic description with a hat notation (e.g. $\hat{\beta}_o$).

In any of these regimes, Little's law tells us that the average
density verifies $\beta = \lambda W$.

B. Fluid Limit 

The fluid limit consists in assuming that, in the steady state 
regime, peers are distributed according to a homogeneous
Poisson point process in D such that the mean number of 
neighbors of any peer is large. In particular, in the fluid limit, 
the presence of a single peer at a given point does not impact 
the distribution of the other peers. 

From Campbell's formula [19], the mean total rate of a
typical location of space (or of a newcomer peer) is then

$$\mu_f = \beta_f\, 2\pi \int_{r=0}^{R} (C/r)\, r\, dr = \beta_f\, 2\pi C R. \tag{5}$$

Now, the fluid limit assumes that a peer sees $\mu_f$ during its
whole lifetime. We get that the mean latency of a peer is

$$W_f = \frac{F}{\mu_f}. \tag{6}$$

Using Little's law, one gets

$$\beta_f \mu_f = \lambda F. \tag{7}$$

From (5), (6) and (7), we have

$$\beta_f = \sqrt{\frac{\lambda F}{2\pi C R}}, \quad \mu_f = \sqrt{2\pi C R \lambda F}, \quad W_f = \sqrt{\frac{F}{2\pi C R \lambda}}. \tag{8}$$

As we see in the expression for the mean latency in (8),
the fluid limit exhibits the same super-scalability as the toy
example: in spite of the fact that the interactions are limited in
range and depend on the distance, the mean latency decreases
in $1/\sqrt{\lambda}$ when $\lambda$ tends to infinity and everything else is fixed.
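The fluid-limit quantities follow from solving $\mu_f = 2\pi C R \beta_f$ together with $\beta_f \mu_f = \lambda F$. The sketch below computes them under that reading of (5) and (7); the function name and the returned tuple layout are illustrative choices.

```python
import math


def fluid_limit(lam, F, C, R):
    """Return (beta_f, mu_f, W_f, N_f) for the rate function f(r) = C / r."""
    k = 2.0 * math.pi * C * R          # mu_f = k * beta_f
    beta_f = math.sqrt(lam * F / k)    # from beta_f * mu_f = lam * F
    mu_f = k * beta_f
    w_f = F / mu_f                     # mean latency
    n_f = beta_f * math.pi * R ** 2    # mean number of peers within range R
    return beta_f, mu_f, w_f, n_f
```

Quadrupling `lam` with the other parameters fixed halves the returned latency, which is the super-scalability property discussed above.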

Note that in the fluid limit, the mean number of peers in a
ball of radius R around a typical peer is

$$N_f = \beta_f \pi R^2 = \sqrt{\frac{\pi \lambda F R^3}{2C}}. \tag{9}$$

C. Dimensional Analysis



At this point of the paper, the fluid limit is a thought
experiment, not necessarily related to the actual model. Dimensional
analysis [20] helps to connect the two.

In the basic model, the system has 4 parameters (the range
R, the file size F, the peer arrival rate $\lambda$ and the rate parameter
C) expressed in 3 basic physical units (meters, bits, seconds).
The $\pi$-theorem [20] allows us to strip the problem of all its
parameters but one. The idea is that the behavior of a system
is not affected by the physical units used to measure it. By
using proper unit changes [13], the system can be described
by just one dimensionless parameter

$$\rho = \frac{\lambda F R^3}{C}. \tag{10}$$

The $\pi$-theorem leaves some freedom in the choice of the
parameter. By noticing that $N_f = \sqrt{\pi/2}\,\sqrt{\rho}$, we can use $N_f$,
which has a physical interpretation (the number of neighbors
predicted by the fluid limit), instead of $\rho$.
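The invariance under unit changes can be verified directly. The sketch below assumes that the dimensionless parameter is $\rho = \lambda F R^3 / C$, which is, up to a power, the only dimensionless combination of the four parameters; `change_units` is a hypothetical helper that re-expresses the same physical system in other units.

```python
import math


def rho(lam, F, C, R):
    """Dimensionless group lam * F * R^3 / C (assumed form of the parameter)."""
    return lam * F * R ** 3 / C


def n_f(lam, F, C, R):
    """Number of neighbors predicted by the fluid limit, sqrt(pi * rho / 2)."""
    return math.sqrt(math.pi * rho(lam, F, C, R) / 2.0)


def change_units(lam, F, C, R, meter=1.0, bit=1.0, second=1.0):
    """Same system in new units: one meter = `meter` new length units, etc."""
    return (lam / (meter ** 2 * second),  # lam is in m^-2 s^-1
            F * bit,                      # F is in bits
            C * bit * meter / second,     # C is in bits m s^-1
            R * meter)                    # R is in m
```

Switching, say, from meters to kilometers, from bits to bytes and from seconds to minutes changes all four numerical values but leaves `rho` and `n_f` unchanged.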

The $\pi$-theorem tells us that all systems that share the same
parameter $N_f$ are similar. Now consider the union of two
independent systems that use the same parameters ($\lambda$, F, C,
R): the real model, with latency $W_o$, and the fluid model, with
latency $W_f$. The ratio $W_o / W_f$ is a dimensionless property of the
overall system, therefore it is a function of $N_f$ only. In other
words, there exists a dimensionless function $M(N_f)$ such that:

$$W_o = M(N_f)\, W_f. \tag{11}$$

From Little's law, we also deduce the density:

$$\beta_o = \beta_f\, M(N_f). \tag{12}$$

Note that the dimensional reasoning made on the basic
model can be extended to other models, for instance with
different rate functions or connectivity rules. Equation (12)
will remain true, although the shape of M may change;
in particular, if the system is described by more than 4
parameters, M may depend on more than one variable.

To summarize, although the system in the basic model 
may be subject to complex interactions and is defined by four 
independent parameters, dimensional analysis allows one to 
express its general behavior through a one-parameter function 
M (unknown at this point), which expresses how far the actual 
system is from its fluid limit. 

D. Fluid as a Bound 

We now give a better understanding of the behavior of the 
real system through the following theorems. 

Theorem 1 (Fluid as a bound): $M \geq 1$. In other words,
the fluid regime is actually a lower bound for the mean latency
and the peer density.



The proof comes from a stochastic intensity argument. This
property stems from the fact that as a peer uploads content to
its neighbors, it makes them leave the system faster than if it
did not upload anything. This is called a repulsion effect. As a
result, the mean download rate experienced by a typical peer
(Palm distribution) is less than the mean download rate that
a virtual, non-uploading peer located at a typical location
of D would experience. Details can be found in [13].

Theorem 2 (Fluid as a limit): When $N_f$ goes to infinity,
M goes to 1, and the law of a typical peer latency converges
weakly to an exponential random variable with parameter
$1/W_f$.

Theorem 2 says that the fluid bound is tight: when the
number of neighbors predicted by the fluid limit tends towards
infinity, the system behaves like its fluid limit.

The idea of the proof is that, when Nf tends to infinity: 
(i) the traffic is high enough for the impact of one given peer, 
and thus the repulsion effect, to be neglected; (ii) the peers 
stay long enough to make the fluctuations slow and weak. The 
fact that the rate at any point is constant in the limit implies 
that the latency is exponential in the limit. 

E. Heuristic 

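For arbitrary values of $N_f$, we propose to approximate M
by $\hat{M}$, the unique solution in $[1, \infty)$ of

$$\hat{M}^2 \left( 1 - \frac{\hat{M}}{2 N_f} \ln\left( 1 + \frac{2 N_f}{\hat{M}} \right) \right) = 1. \tag{13}$$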



In order to derive (13), we use a heuristic factorization of
the factorial moment measure of order 3 of the stationary peer
point process (see [19] for the definition of these measures)
which is described in [13]. Informally, the method consists
in computing an approximation $\hat{\mu}_o$ of the average rate of a
peer assuming that: (i) a neighbor at distance r from that peer
"sees" a rate $\hat{\mu}_o + \frac{C}{r}$; (ii) in return, the peer "sees" at distance
r a density of neighbors $\frac{\lambda F}{\hat{\mu}_o + C/r}$ (using (7)).

This heuristic is in line with Theorems 1 and 2.
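The heuristic value $\hat{M}$ can be computed numerically. The sketch below assumes that the fixed-point equation reads $\hat{M}^2 \left(1 - \frac{\hat{M}}{2N_f}\ln\left(1 + \frac{2N_f}{\hat{M}}\right)\right) = 1$, which is the form obtained by integrating the density of step (ii) against the rate C/r over the range and using $\hat{\mu}_o = \mu_f / \hat{M}$; this reading is an assumption of the sketch, and bisection is simply one convenient solver.

```python
import math


def heuristic_residual(m, n_f):
    """Left-hand side minus right-hand side of the assumed heuristic equation."""
    x = 2.0 * n_f / m
    return m * m * (1.0 - math.log1p(x) / x) - 1.0


def heuristic_m(n_f, iterations=200):
    """Solve the heuristic equation for M in [1, infinity) by bisection."""
    lo, hi = 1.0, 2.0
    while heuristic_residual(hi, n_f) <= 0.0:  # the residual is negative at M = 1
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if heuristic_residual(mid, n_f) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this reading, the solution approaches 1 for large $N_f$ and grows like $1/N_f$ for small $N_f$.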

Remark 2: When $N_f$ goes to 0, the system admits another
limit, called hard-core, which was not presented here due to its
lack of interest for real P2P systems. Nevertheless, the heuristic
is in line with the hard-core limit too, which predicts that M
behaves like $1/N_f$ when $N_f$ goes to 0.

F. Validation 

We validated and substantiated our results by means of
simulations of our model. We used a discrete time simulator
to evaluate the basic model for several values of $N_f$ (see [13]
for details). Key results are displayed in Figure 1, which allows
us to check almost all results of this section at a glance:

• $M = 1$ is a lower bound of the actual system (Theorem 1);

• as $N_f$ goes to $\infty$, the bound becomes tight (Theorem 2);

• the heuristic (13) gives a good approximation of M;

• as $N_f$ goes to 0, the system behavior converges towards the
hard-core limit $M = 1/N_f$ (cf. Remark 2).

We also checked that for $N_f$ big enough, it is quite difficult
to distinguish the system from a spatial birth and death process
with birth parameter $\lambda$ and death parameter $1/W_f$, namely a
Poisson point process of intensity $\beta_f$ (cf. [13]).

Fig. 1. $M(N_f)$ in the basic model.



V. Network Capacity Constraints 

Super-scalability naturally raises the question of the burden
on the underlying network. The aim of this section is to
determine the capacity required for the network elements in
order to achieve the super-scalable regime identified above.

So far, the only assumptions on the network were that 1)
the access is not the (only) bottleneck; 2) the network is a
bottleneck, resulting in a transfer rate between peers that
depends on their distance.

This section introduces an abstract network model on 
which the P2P traffic will be mapped through some natural 
shortest path routing mechanism. We determine the mean flow 
that traverses a typical network element. This flow of course 
depends on the protocols used in the network which in turn 
determine the bit rate function. 

For simplicity, we consider the fluid limit of the system. 

A. Network Model 

We consider an underlying network made of routers and 
links between them where 

• routers form a realization of a spatial Poisson point
process of intensity $\theta$;

• links are the Delaunay edges (see e.g. [21], Chapt. 4) on
this point process;

• the capacity of a link is E;

• each peer is directly connected to the closest router and
the path between two routers is a minimal path (with
respect to hop count) on the Delaunay graph.

In this case, the number of links between two peers is
asymptotically proportional to the distance between them [21].

Consider a straight line of the plane of length $l$. The average
number of links that go through the line is $2 l \sqrt{\theta}$, so the
maximal traffic that can cross the line is $2 E l \sqrt{\theta}$. In other
words, $S := 2\sqrt{\theta}\,E$ is a parameter that describes the capacity
of the network, expressed in bits · s$^{-1}$ · m$^{-1}$.

B. Flow Equations 

Let $\Psi(e)$ denote the mean value of the P2P traffic that
goes through a segment S of length e in the fluid regime. By
isotropy, we can focus on $S^* = [(0, -\frac{1}{2}), (0, \frac{1}{2})]$.

A simple stochastic geometry argument shows that

$$\Psi = \Psi(1) = 4 \beta_f^2 \int_0^R r^2 f(r)\, dr \tag{14}$$

(see [13]). Using the fluid expression of the density

$$\beta_f = \sqrt{\frac{\lambda F}{2\pi \int_0^R r f(r)\, dr}},$$

we get the key relation

$$\Psi = \frac{2 \lambda F}{\pi} \cdot \frac{\int_0^R r^2 f(r)\, dr}{\int_0^R r f(r)\, dr}. \tag{15}$$

Equation (15) holds for an arbitrary rate function f. For $f(r) = C/r$
we get

$$\Psi = 2 C \beta_f^2 R^2 = \frac{1}{\pi} \lambda F R. \tag{16}$$
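The closed form for f(r) = C/r can be cross-checked numerically. The sketch below assumes that the key relation reads $\Psi = \frac{2\lambda F}{\pi} \int_0^R r^2 f(r)\,dr \big/ \int_0^R r f(r)\,dr$; the midpoint rule is used only because it never evaluates f at r = 0, and the function names are illustrative.

```python
import math


def integrate(g, a, b, n=10_000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def flow_through_unit_segment(lam, F, R, f):
    """Mean P2P traffic through a unit segment for a generic rate function f."""
    numerator = integrate(lambda r: r * r * f(r), 0.0, R)
    denominator = integrate(lambda r: r * f(r), 0.0, R)
    return 2.0 * lam * F / math.pi * numerator / denominator
```

With `f = lambda r: C / r` the result matches $\lambda F R / \pi$ whatever the value of C, since C cancels in the ratio of the two integrals.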



C. Feasibility Condition 

Now, in order to simplify the evaluation of the P2P load
on the underlying network, we assume that (a) $\theta$ is large
enough so that the hop-count between two peers can be seen
as proportional to their distance and the flow between them as
a straight line; (b) any rate smaller than $S l$ can be transported
through a segment of length $l$. Under these assumptions, the
condition for the network to sustain the rate generated by our
model is

$$\Psi < S. \tag{17}$$

Note that the flow $\Psi$ in (16) does not depend on C, so that
condition (17) does not either. This surprising result means that
in the fluid limit, we can arbitrarily scale the individual rate of
connections (thus decreasing the latency) without changing the
burden on the underlying network. Of course, there is a flaw
in that reasoning: increasing C eventually impairs the validity
of the fluid limit. As C increases, $N_f$ gets smaller so we tend
to leave the fluid limit and the approximations we used do not
apply anymore [13].
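The feasibility condition can be turned into a small check. This sketch assumes that the network capacity per meter is $S = 2\sqrt{\theta}E$ and that the flow for the rate function C/r is $\lambda F R / \pi$; the function names and the derived `max_range` helper are illustrative additions.

```python
import math


def network_capacity(theta, E):
    """Capacity per meter of the router network (bits per second per meter)."""
    return 2.0 * math.sqrt(theta) * E


def p2p_flow(lam, F, R):
    """Fluid-limit P2P traffic through a unit segment for f(r) = C / r."""
    return lam * F * R / math.pi


def is_feasible(lam, F, R, theta, E):
    """True when the network can sustain the traffic generated by the model."""
    return p2p_flow(lam, F, R) < network_capacity(theta, E)


def max_range(lam, F, theta, E):
    """Largest range R for which the flow stays below the network capacity."""
    return math.pi * network_capacity(theta, E) / (lam * F)
```

Note that C appears in none of these functions, which reflects the remark that the condition does not depend on the rate parameter in the fluid limit.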

VI. Adding Chunks to the Model 

This section contains a mathematical model and a
simulation study allowing one to quantify the impact of chunk
availability. An important result is that when both the number
of chunks and the parameter $N_f$ (introduced in Section IV) are
large, then the system behaves as the chunkless fluid model
of Section IV.

A. Chunk Modeling 

We assume now that the file has a constant size F and is 
divided into K chunks of equal length. At any time, a peer is 
characterized by its collection, which is the subset of chunks 
it fully possesses. With respect to dimensional analysis, the 
system is now described by two parameters: Nf and K. 

For simplicity, we focus on the steady state taken in its
fluid limit with respect to the peers, and we assume that the
chunk scheduling policy is based on the following principles:

• rarest chunk first: when a peer can choose between
chunks to download, it selects the one with fewest copies in
its neighborhood; as in [3], we assume that this prevents the
missing chunk syndrome and ensures that a peer with k chunks
has a collection of chunks which is independent of that of the
other peers and uniform on the subsets of cardinality k of the
set $\{1, \ldots, K\}$;

• random peer order: when it can download a given chunk
from many neighbors within its range, a peer chooses one at
random (the scheduling is not network-aware).

There are two main ways to manage the download of 
simultaneous chunks: in the one-to-one model, a peer gets 
one chunk from a single neighbor, while in the many-to-one 
model, it can aggregate the resources of all neighbors that 
possess that chunk. The many-to-one approach gives better 
theoretical performance, as we will see below, but it requires 
a tight synchronization between peers that collaborate for a 
chunk, and thus may require an additional overhead in practice. 

B. Performance Study 

An exhaustive study would require considering the $2^K - 1$
possible collections (although seeders are initially needed to
bootstrap the system, we still consider a steady state with
no seeder, so there is no full collection). With the proposed
assumptions, the impact of chunks mainly depends on the
number of chunks already possessed by the peers. We say
that a peer belongs to class k, for $0 \leq k \leq K - 1$, if it
possesses exactly k complete chunks. The following theorem
gives the performance of each class in the fluid regime (by fluid
regime, we mean i) a chunk regime where the independence
and uniformity assumptions described above on the distribution
of the chunks hold and ii) a peer regime where the Poisson
assumptions described in the preceding section hold).

Theorem 3: In the fluid limit, the mean total download rate
of a peer of class k, $0 \leq k < K$, is

$$\mu_k = \eta_k \mu_f, \tag{18}$$

where $\mu_f$ is given by (8). Equation (22) gives the $\eta_k$'s for the
many-to-one scheduling while (24) gives a lower bound for
the one-to-one case.

Proof: In view of our assumptions on the scheduling
and on the distribution of peers, the average rate of a
given transfer is just the average over the range, that is
$\frac{1}{\pi R^2} \int_0^R 2\pi r\, (C/r)\, dr = \frac{2C}{R}$.

Now, we consider a peer p of class k with a neighbor
q of class j. In view of our assumptions on the distribution
of chunks, the probability that q has at least one chunk that
p wants, which coincides with the probability that the set of
chunks of q is not included in that of p, is

$$z(k, j) = 1 - \binom{k}{j} \Big/ \binom{K}{j}, \tag{19}$$

with the convention that $\binom{k}{j} = 0$ for $j > k$. Thus, if $\beta_j$ denotes
the density of class j, the number of neighbors from whom a
given peer of class k may download one chunk is

$$N_k = \pi R^2 \sum_{j=0}^{K-1} \beta_j\, z(k, j). \tag{20}$$



In the many-to-one model, we deduce that the average
download rate is

$$\mu_k = 2\pi C R \sum_{j=0}^{K-1} \beta_j\, z(k, j). \tag{21}$$

We notice then that for class k, (7) becomes $\beta_k = \frac{\lambda F}{K \mu_k}$.
To conclude, we define $\eta_k := \frac{\mu_k}{\mu_f}$, where $\mu_f$ is given by (8). If
we replace $\beta_j$ by $\frac{\lambda F}{K \eta_j \mu_f}$ in (21) and use the relationships from
(5) and (7), we get

$$\eta_k = \frac{1}{K} \sum_{j=0}^{K-1} \frac{z(k, j)}{\eta_j}. \tag{22}$$



In the one-to-one model, a peer cannot download a chunk
from more than one peer. In the worst case where each of the
$N_k$ peers has at most one of the desired chunks, the probability
that p can download any given desired chunk is
$1 - \left(1 - \frac{1}{K-k}\right)^{N_k}$, so that the average number of chunks downloaded
is

$$(K - k) \left( 1 - \left( 1 - \frac{1}{K-k} \right)^{N_k} \right). \tag{23}$$

Adapting (21), using the same variable changes as for the
many-to-one case, and using $N_f$ as a lower bound for $N_k$, one
gets:

$$\eta_k \geq \frac{K - k}{N_f} \left( 1 - \left( 1 - \frac{1}{K-k} \right)^{N_f} \right). \tag{24}$$



Equation (22) is easily solved using fixed-point iterations.
Notice that the computation depends solely on K in the
many-to-one model and on K and $N_f$ in the one-to-one model. If
$\eta$ denotes the harmonic mean of the $\eta_k$'s, we verify that the
overall latency W is $W_f / \eta$. Therefore, as for the model proposed
in [3], $\eta$ can be used to scale the results of the basic model
and ignore the underlying, possibly complex, chunk exchange
mechanisms.
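A fixed-point computation of the many-to-one efficiencies can be sketched as follows. It assumes that the system to solve is $\eta_k = \frac{1}{K}\sum_{j=0}^{K-1} z(k,j)/\eta_j$ with $z(k,j) = 1 - \binom{k}{j}/\binom{K}{j}$. The averaging step (damping) is a choice made for this sketch, since an undamped iteration of a map of this type can oscillate; it is not necessarily the scheme used by the authors, and K is assumed to be at least 2.

```python
from math import comb


def useful_probability(k, j, K):
    """z(k, j): probability that a class-j peer holds a chunk a class-k peer lacks."""
    return 1.0 - comb(k, j) / comb(K, j)  # comb(k, j) is 0 when j > k


def many_to_one_efficiencies(K, tol=1e-12, max_iter=10_000):
    """Damped fixed-point iteration for the class efficiencies eta_0..eta_{K-1}."""
    z = [[useful_probability(k, j, K) for j in range(K)] for k in range(K)]
    eta = [1.0] * K
    for _ in range(max_iter):
        target = [sum(z[k][j] / eta[j] for j in range(K)) / K for k in range(K)]
        # Damping: move halfway towards the image of the current iterate.
        new_eta = [0.5 * (e + t) for e, t in zip(eta, target)]
        if max(abs(a - b) for a, b in zip(new_eta, eta)) < tol:
            return new_eta
        eta = new_eta
    return eta  # best effort if the tolerance was not reached


def harmonic_mean(values):
    """Harmonic mean, used to aggregate the class efficiencies into eta."""
    return len(values) / sum(1.0 / v for v in values)
```

For K = 2 the system can be solved by hand ($\eta_1 = 1/2$, $\eta_0 = 1$, harmonic mean 2/3), which gives a simple sanity check of the iteration.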

Remark 3: In the basic model we had $W = M(N_f) W_f$,
so we can interpret $1/\eta$ as $M(N_f, K)$ in the case $N_f \gg 1$.

We now study the behavior of rj in the fluid limit. 

Theorem 4: In the many-to-one model, and in the
one-to-one model if $N_f$ is large enough yet fixed, we have

$$\eta \xrightarrow[K \to \infty]{} 1. \tag{25}$$



Sketch of Proof: For the many-to-one model, we use a
scaling technique that consists in letting K go to infinity so as
to make the $\eta_k$ converge toward a continuous function in [0, 1).
The basic ingredient is the fact that the function z defined in
(19) converges pointwise to 1 under this scaling. The scaling
of (22) yields

$$\eta(x) = \int_0^1 \frac{1}{\eta(y)}\, dy. \tag{26}$$

It is not difficult to show that $\eta = 1$ is the unique positive
solution of this functional equation, which proves (25)
for the many-to-one case.



In the one-to-one model, (25) is straightforward when noticing that $\eta$ is always smaller than or equal to 1 (the overall download capacity is lowered because of availability issues). The limit of (24) when $K$ tends to $\infty$ allows one to conclude.



The fact that a peer cannot download a given chunk from
more than one peer badly impacts the performance of the one- 
to-one model, compared to many-to-one. This is especially 
true at the end of the download, when a peer may have 
more useful neighbors than remaining chunks. This fact was 
empirically observed by Bram Cohen in his original BitTorrent 
design, where he proposed to use one-to-one (which is easier to 
maintain) most of the time except for the very few last chunks, 
where peers switch to many-to-one (endgame behavior [Vj). 



TABLE II. Some rate functions with explicit strength $\gamma$

Fig. 2. Efficiency $\eta$ as a function of $K$ ($N_f = 40$).

C. Validation 

We simulate the system with chunks in order to substantiate 
our claims, using a simple rarest first chunk selection and 
random peer selection like the one proposed. Synchronization 
is one-to-one. 

First, we validate the assumption on the distribution of 
chunks by checking the impact of the presence of a chunk 
at some peer on the presence of this chunk at the neighboring 
peers. For instance, for Nf = 40, K = 200, we verified that 
a peer sees on average 29.22 copies of a chunk it possesses
(itself not included), and 29.10 copies of a chunk it misses. 
This and more detailed correlation analysis (that cannot be 
included here due to space limitation) are quite conclusive. 

We launched many trials to verify our results. Figure 2 displays the value of $\eta$ for several values of $K$. One verifies that the system performs better than the proposed lower bound, and exhibits the expected behavior when $K$ grows.

D. Conclusion on Chunks 

We showed (through analysis and simulation) that in the fluid limit ($N_f \gg 1$), when $K \gg 1$, the system with chunks behaves like the fluid chunkless model of Section IV with an appropriate efficiency parameter $\eta$, which we described.

The parameter $\eta$ can be close to 1 if $K$ is large enough, with $N_f$ being fixed in the one-to-one model. In this last case, super-scalability could be impacted: as $\lambda$ increases, so does $N_f$ and if $K$ is fixed, the lower bound converges to 0 (simulations confirm that this is also the case for $\eta$). The possible workarounds for this issue are: to use many-to-one, or equivalently one-to-one with endgame, to get rid of the last chunks bottleneck; to limit the number of neighbors in order to keep $N_f$ bounded (this will be detailed in Section VII-F).



VII. Extensions of the Basic Model 

The aim of this section is to show that our analysis can 
be extended in several ways and take important practical phe- 
nomena into account. Unless otherwise stated, we will place 
ourselves in the fluid regime, but the dimensional analysis 
approach can be used with all extensions to relate the fluid 
limit to the real system through some function $M$. As we have seen when introducing the chunks, if an extension introduces new parameters, $M$ can be a function of several dimensionless variables (replacing $N_f$).

For the sake of clarity, the proposed extensions are presented separately, but interleaving extensions is straightforward in the fluid limit. Outside the fluid limit, the complexity of mixed extensions will mainly depend on the complexity of the corresponding $M$ function.

A. More General Rate Functions 

While we focused for the basic model on the rate function (3), all our results can easily be generalized to any rate function $f$ such that $\int_0^\infty r f(r)\,dr < \infty$.

For a rate function $f$, the fluid rate Equation (5) becomes

$$\mu_f = \beta_f \gamma, \quad \text{with } \gamma = 2\pi \int_{r=0}^{\infty} r f(r)\,dr. \tag{27}$$

The characteristic $\gamma$, which is expressed in $\text{bit} \cdot \text{s}^{-1} \cdot \text{m}^2$, is the sum of $f$ over its range, so we call it the strength of $f$. Once $\gamma$ is known, we can generalize (8) as

$$\beta_f = \sqrt{\frac{\lambda F}{\gamma}}, \quad \mu_f = \sqrt{\lambda F \gamma}, \quad W_f = \sqrt{\frac{F}{\lambda \gamma}}. \tag{28}$$

We observe that the scaling in $\frac{1}{\sqrt{\lambda}}$ still holds. For the rest of the paper, we use directly the strength $\gamma$ instead of (3).
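As an illustration of the strength, the sketch below evaluates $\gamma = 2\pi \int r f(r)\,dr$ with a midpoint rule and then computes the fluid quantities $\beta_f = \sqrt{\lambda F/\gamma}$, $\mu_f = \sqrt{\lambda F \gamma}$ and $W_f = \sqrt{F/(\lambda\gamma)}$. It assumes a TCP-like rate function of the form $f(r) = C/r$ for $r \le R$, which is consistent with the value $\gamma = 2\pi R C$ quoted for the basic model; the function names and the numerical values are hypothetical.

```python
import math


def strength(f, r_max: float, steps: int = 100000) -> float:
    """Midpoint-rule estimate of 2*pi * integral_0^r_max r*f(r) dr."""
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h  # midpoint of the i-th sub-interval, never 0
        total += r * f(r)
    return 2.0 * math.pi * total * h


def fluid_quantities(lam: float, F: float, gamma: float):
    """Return (beta_f, mu_f, W_f) for arrival rate lam, file size F, strength gamma."""
    beta = math.sqrt(lam * F / gamma)
    mu = math.sqrt(lam * F * gamma)
    W = math.sqrt(F / (lam * gamma))
    return beta, mu, W


# Illustrative values only (not taken from the paper).
C, R = 2.0, 5.0
gamma_tcp = strength(lambda r: C / r, R)

if __name__ == "__main__":
    print("numerical strength:", gamma_tcp, "closed form:", 2.0 * math.pi * R * C)
    for lam in (0.5, 2.0, 8.0):
        print(lam, fluid_quantities(lam, 100.0, gamma_tcp))
```

Multiplying the arrival rate by 4 halves the latency in this sketch, which is the $1/\sqrt{\lambda}$ scaling.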

Table II gives the strength of the following rate functions:

• The TCP-like example of the basic model; 

• Constant rate function, where each flow has a bandwidth $U$. This corresponds for instance to the case where the transport protocol is UDP and bandwidth is limited by the application;

• Mix of the above, where the rate is TCP-like with an 
upper bound set by the application; 

• TCP-like with some additive offset q that accounts for the 
mean delay in the two access networks; 

• Capacity of a wireless AWGN channel. 

In most cases, the heuristic approximation $M$ can be adapted to $f$. For instance, a constant $f$ leads to (cf. (13))



M = 




(29) 



If $R = \infty$, the system parameter $N_f = \pi R^2 \beta_f$ is not properly defined anymore, which impairs a direct introduction of $M$. If $\int_{r>0} r^2 f(r)\,dr < \infty$, a simple workaround is to use the following ratio (already considered in (15))

$$\bar{R} = \frac{\int_{r>0} r^2 f(r)\,dr}{\int_{r>0} r f(r)\,dr} \tag{30}$$

instead of $R$ and to extend the dimensional analysis accordingly ($\bar{R}$ being interpreted as the typical range of $f$). If $\int_{r>0} r^2 f(r)\,dr = \infty$, then according to (14) the traffic load intensity is infinite, so the rate function is probably ill-defined with respect to the underlying, capacity-limited, network.

B. Permanent Servers 

The system may benefit from servers, or eternal seeders. For instance they can be introduced to: (i) solve the issue of chunk availability by being able to provide any requested chunk; (ii) allow one to consider hybrid systems that combine classical server solutions and a P2P approach; (iii) avoid the fact that in our model, the latency goes to $\infty$ when $\lambda$ goes to 0 (non-popular content syndrome).

We focus on the basic model. 

The servers are characterized by their density of bitrate $U_c$, expressed in $\text{bit} \cdot \text{s}^{-1} \cdot \text{m}^{-2}$, so that if $\beta_f$ is the peer density, a typical peer gets $\frac{U_c}{\beta_f}$ from the servers.

To describe the system, we need another dimensionless parameter in addition to $N_f$. We conveniently choose $\chi := \frac{\lambda F}{U_c}$, which expresses the ratio between the density of rate needed by the system and the density of rate provided by the servers. If $\chi < 1$, then the permanent rate from servers is sufficient to serve the peers, otherwise P2P transfer is needed for stability.

Let us focus on the two limiting cases: the system is mainly client/server ($\chi \ll 1$), or the system is mainly P2P with a small server assistance ($\chi \gg 1$). The case $\chi \gg 1$ can be seen as a scenario where servers are here mainly for ensuring chunk availability.

If $\chi \ll 1$, then almost all resources come from the servers. This implies that the point process is hard-core (a peer sees almost no neighbor in its range while it is a leecher, otherwise the P2P traffic would not be negligible), so a peer can collect all the available bandwidth in its range. We deduce the average latency:

$$\frac{F}{\pi R^2 U_c}. \tag{31}$$

For $\chi \gg 1$, in the fluid limit ($N_f \gg 1$), we can adapt the fluid rate equation, which gives

$$\mu_{f,c} = \beta_{f,c}\gamma + \frac{U_c}{\beta_{f,c}}, \tag{32}$$

from which we deduce

$$W_{f,c} = W_f \sqrt{1 - \frac{1}{\chi}} \approx W_f. \tag{33}$$



C. Abandonment 

Here we consider the case where all leechers have some abandonment rate. Let $a$ denote this rate. In the stationary state, we have $\lambda = \left(\frac{\mu_f}{F} + a\right)\beta_f$. From (27), we deduce $\mu_f^2 + \mu_f a F = \lambda F \gamma$. The positive solution of this equation is

$$\mu_f = \sqrt{\lambda F \gamma + \left(\frac{aF}{2}\right)^2} - \frac{aF}{2}. \tag{34}$$

The analysis can hence be extended without difficulties. For instance, the abandonment ratio is given by $\frac{aF}{\mu_f + aF}$.
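The abandonment case can be checked numerically. The sketch below assumes the stationary balance $\lambda = (\mu_f/F + a)\beta_f$ together with $\mu_f = \beta_f\gamma$, takes the positive root of $\mu_f^2 + \mu_f a F = \lambda F \gamma$, and evaluates the abandonment ratio $aF/(\mu_f + aF)$. The function names and the numerical values used with them are hypothetical.

```python
import math


def rate_with_abandonment(lam: float, F: float, gamma: float, a: float) -> float:
    """Positive root of mu^2 + mu*a*F = lam*F*gamma."""
    half = a * F / 2.0
    return math.sqrt(lam * F * gamma + half * half) - half


def abandonment_ratio(mu: float, F: float, a: float) -> float:
    """Fraction of leechers that leave before completing their download."""
    return a * F / (mu + a * F)


if __name__ == "__main__":
    lam, F, gamma = 0.5, 100.0, 10.0  # illustrative values only
    for a in (0.0, 0.01, 0.1, 1.0):
        mu = rate_with_abandonment(lam, F, gamma, a)
        print(a, mu, abandonment_ratio(mu, F, a))
```

With $a = 0$ the rate reduces to $\sqrt{\lambda F \gamma}$, the value without abandonment.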



This is distinct from the case where leechers can seed for some time after 
they complete their download, which is addressed in Section VII-E.



D. Per Peer Rate Limitation 

Due to the asymmetric nature of certain access networks 
(e.g. ADSL), the uplink rate is often the most important access 
rate limitation. Let U denote (here) the average upload capacity 
of a peer; then the average rate in the fluid limit should be such 
that 

$$\mu_f = \sqrt{\lambda F \gamma} < U. \tag{35}$$

If $\gamma = 2\pi R C$ (basic model), a dimensioning rule could be to choose $R = \frac{U^2}{\lambda F 2\pi C}$ so that all the available capacity is used.

E. Leechers and Seeders 

When a leecher has obtained all its chunks, it can become a seeder and remain such for a duration $T_s$. In this setting, there is a density of seeders $\lambda T_s$ in the stationary regime.

In the fluid limit with seeders, (27) becomes

$$\mu_{f,s} = (\beta_{f,s} + \lambda T_s)\gamma. \tag{36}$$

Using (7) and $F = W_{f,s}\,\mu_{f,s}$, we get

$$W_{f,s}^2 + W_{f,s} T_s = W_f^2. \tag{37}$$

The positive solution of this equation is

$$W_{f,s} = \frac{\sqrt{T_s^2 + 4 W_f^2} - T_s}{2}. \tag{38}$$

In particular, we have $W_{f,s} \approx W_f$ for $T_s \ll W_f$ and $W_{f,s} \approx \frac{W_f^2}{T_s}$ for $T_s \gg W_f$.
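The two regimes of the seeder model can be illustrated as follows. This is a minimal sketch that takes the positive root of $W_{f,s}^2 + W_{f,s} T_s = W_f^2$; it is written in the algebraically equivalent form $2W_f^2 / (T_s + \sqrt{T_s^2 + 4W_f^2})$, chosen here only to avoid cancellation when $T_s$ is large. The function name and the numerical values are hypothetical.

```python
import math


def latency_with_seeders(W_f: float, T_s: float) -> float:
    """Positive root of W^2 + W*T_s = W_f^2, in a cancellation-free form."""
    return 2.0 * W_f * W_f / (T_s + math.sqrt(T_s * T_s + 4.0 * W_f * W_f))


if __name__ == "__main__":
    W_f = 10.0  # illustrative latency without seeders
    for T_s in (0.0, 0.1, 10.0, 1000.0):
        W = latency_with_seeders(W_f, T_s)
        # For large T_s, W approaches W_f**2 / T_s; for small T_s, W approaches W_f.
        print(T_s, W, W_f * W_f / T_s if T_s > 0 else None)
```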

Remark 4: Seeders can also greatly improve the performance in the case where $N_f$ is small, by ensuring that a leecher can find peers in its range with high probability (cf. [13] for more details).

F. Limited Degree 

In the basic model, we limit connectivity by range for 
mathematical tractability, but in practice, most P2P systems 
use a limitation based on the number of connections per peer. 

However, degree limited connectivity can be linked to our model. Consider that an ALTO-like mechanism allows each peer to connect to its $L$ nearest peers. If $L$ is high enough, it can be identified with $N_f$ and the behavior will be fluid. The degree connectivity can then be approximated by a range connectivity such that $L$, $R$ and $\beta$ verify

$$\pi R^2 \beta = L. \tag{39}$$



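Using (7) and (27), we get an equation that $\beta$ must verify:

$$\beta^2 \gamma(\beta) = \lambda F, \tag{40}$$

where $\gamma(\beta)$ is the strength of the rate function $f$ when using $R = \sqrt{\frac{L}{\pi\beta}}$ (see for instance Table II).

Once $\beta$ is known, we deduce $W = \frac{\beta}{\lambda}$. For instance, using the rate function of the basic model, one gets

$$W = \left(\frac{F}{2C}\right)^{\frac{2}{3}} \left(\frac{1}{\pi \lambda L}\right)^{\frac{1}{3}}. \tag{41}$$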



We observe that the super-scalability property still exists 
(although slightly diminished), despite the fact that each peer 
has a limited number of neighbors. This is a consequence of 
having a decreasing / function: as the arrival rate increases, so 
does the density and thus the rate of individual connections. 
By comparison, a system with a constant rate function like in the toy example is simply scalable if the degree connectivity is limited (the latency is obviously $W = \frac{F}{LU}$).
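The degree-limited latency can be recovered numerically. The sketch below solves $\beta^2\gamma(\beta) = \lambda F$ by bisection, assuming the basic-model strength $\gamma = 2\pi R C$ with $R = \sqrt{L/(\pi\beta)}$, and compares $W = \beta/\lambda$ with the closed form $\left(\frac{F}{2C}\right)^{2/3}\left(\frac{1}{\pi\lambda L}\right)^{1/3}$. The function names, the bisection bracket and the numerical values are hypothetical.

```python
import math


def gamma_basic(beta: float, L: float, C: float) -> float:
    """Strength 2*pi*R*C of the basic model with R = sqrt(L / (pi * beta))."""
    return 2.0 * math.pi * C * math.sqrt(L / (math.pi * beta))


def solve_density(lam: float, F: float, L: float, C: float,
                  lo: float = 1e-12, hi: float = 1e12, iters: int = 200) -> float:
    """Bisection on a log scale for beta^2 * gamma(beta) = lam * F.

    The left-hand side grows like beta**1.5, so it is increasing in beta;
    the bracket [lo, hi] is assumed to contain the root.
    """
    def g(b: float) -> float:
        return b * b * gamma_basic(b, L, C) - lam * F

    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


def latency_closed_form(lam: float, F: float, L: float, C: float) -> float:
    return (F / (2.0 * C)) ** (2.0 / 3.0) * (1.0 / (math.pi * lam * L)) ** (1.0 / 3.0)


if __name__ == "__main__":
    F, L, C = 100.0, 40.0, 2.0  # illustrative values only
    for lam in (0.5, 4.0, 32.0):
        beta = solve_density(lam, F, L, C)
        print(lam, beta / lam, latency_closed_form(lam, F, L, C))
```

In this sketch the latency decreases like $\lambda^{-1/3}$, slower than the $\lambda^{-1/2}$ of the range model but still decreasing with the load.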



Finally, we can propose a fluid model that encompasses both the range and degree models. Consider that there is a function $p(r, \beta)$ that describes the probability that a peer connects to another one given that their distance is $r$ and the density is $\beta$.

The equation to solve is still (40), except that we now define

$$\gamma(\beta) = \int_{r>0} 2\pi r f(r)\, p(r, \beta)\,dr. \tag{42}$$

Under this formalism, the range model is simply $p(r, \beta) = 1_{r<R}$, while the degree limited model corresponds to $p(r, \beta) = 1_{r<\sqrt{L/(\pi\beta)}}$. For these two cases, the function $p$ corresponds to


very simple overlays, but it could be used to model more 
complex structures like random geometric graphs. 

VIII. Conclusion 

In a P2P system with a rate function $f$ and a range $R$, the following general law quantifying P2P super-scalability was identified: the stationary latency is of the form

$$W = \sqrt{\frac{F}{\lambda\gamma}}\; M\!\left(\pi R^2 \sqrt{\frac{\lambda F}{\gamma}}\right), \tag{43}$$

with $\gamma = 2\pi \int_0^R r f(r)\,dr$ and with $M(x)$ a function which is larger than 1 and tends to 1 when $x$ tends to infinity. In the TCP case, the function $x \mapsto M(x)$ is decreasing and hence reinforces super-scalability.



The conditions for the super-scalability formula (43) to hold were also identified: (1) The number of chunks should be large (so as to be in the fluid regime w.r.t. chunks); (2) The parameter $N_f = \pi R^2 \sqrt{\lambda F / \gamma}$ should be large (so as to be in the fluid regime w.r.t. peers). If (1) or (2) does not hold, then chunk/peer availability issues will dominate and the model breaks down; (3) The network should have the capacity to cope with the P2P traffic, i.e.



$$E\sqrt{\theta} > \frac{2\lambda F}{\gamma} \int r^2 f(r)\,dr, \tag{44}$$



where $\theta$ is the spatial intensity of routers and $E$ the typical link capacity. Hence the capacity of the network should scale like $\lambda$ if other parameters are unchanged. If this condition does not hold, the network cannot cope with the traffic and the model breaks down; (4) The access should not be the bottleneck, which translates into the requirement

$$U > \sqrt{\lambda F \gamma}, \tag{45}$$

where $U$ denotes the (total) upload capacity of each peer. In other words, the latter should scale like $\sqrt{\lambda}$. If this is not the case, then a classical access bottleneck model should be used.